Regularization Path Algorithm for Statistical Learning
Internal identifier: 000879 (France/Analysis); previous: 000878; next: 000880
Authors: Zapien Karina [France]
Source:
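French descriptors
- mix: chemin de régularisation, classification, graphe de similarité, ordonnancement, parcimonie, réduction de dimension, sélection de modèle
English descriptors
- mix: dimensionality reduction, model selection, neighborhood graph, ranking, regularization path, sparsity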
Abstract
The selection of a proper model is an essential task in statistical learning. In general, for a given learning task, a set of parameters has to be chosen, each of which corresponds to a different degree of "complexity". In this situation, the model selection procedure becomes a search for the optimal "complexity", allowing us to estimate a model that ensures good generalization. This model selection problem can be summarized as the calculation of one or more hyperparameters that define the model complexity, in contrast to the parameters that specify a model within the chosen complexity class.
The usual approach to determining these hyperparameters is a "grid search": given a set of candidate values, the generalization error of the best model is estimated for each of them. This thesis focuses on an alternative approach that consists of computing the complete set of solutions for all hyperparameter values. This set is called the regularization path. It can be shown that for the problems we are interested in, parametric quadratic programming (PQP) problems, the corresponding regularization path is piecewise linear. Moreover, its calculation is no more complex than calculating a single PQP solution.
This thesis is organized in three chapters. The first introduces the general setting of a learning problem under the Support Vector Machine (SVM) framework, together with the theory and algorithms that allow us to find a solution. The second deals with supervised learning problems for classification and ranking using the SVM framework. It is shown that the regularization path of these problems is piecewise linear, and alternative proofs to that of Rosset (2004) are given via the subdifferential. These results lead to the corresponding algorithms for solving these supervised problems. The third deals with semi-supervised learning problems, followed by unsupervised learning problems. For semi-supervised learning, a sparsity constraint is introduced along with the corresponding regularization path algorithm. For unsupervised learning, graph-based dimensionality reduction methods are used. Our main contribution is a novel algorithm that chooses the number of nearest neighbors adaptively, in contrast to classical approaches based on a fixed number of neighbors.
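As a hedged illustration of why a piecewise-linear regularization path can replace a grid search, the sketch below uses a deliberately simpler stand-in than the SVM problems treated in the thesis: the lasso with an orthonormal design, whose solution is given in closed form by soft-thresholding. It is not the algorithm of the thesis. The function names (`soft_threshold`, `path_breakpoints`, `solve_at_breakpoints`, `solution_from_path`) and the vector `z` (standing for the correlations between features and response) are assumptions introduced for this example only.

```python
def soft_threshold(z, lam):
    """Closed-form lasso coefficient for one coordinate (orthonormal design assumed)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def path_breakpoints(z):
    """Hyperparameter values where the active set changes, in descending order.

    For this toy problem they are 0 and each distinct |z_j|.
    """
    return sorted({abs(v) for v in z} | {0.0}, reverse=True)


def solve_at_breakpoints(z):
    """Exact solutions computed only at the breakpoints of the path."""
    return [(lam, [soft_threshold(v, lam) for v in z]) for lam in path_breakpoints(z)]


def solution_from_path(path, lam):
    """Recover the solution at any lam >= 0 by linear interpolation on the path."""
    lam_max = path[0][0]
    if lam >= lam_max:
        # Beyond the largest breakpoint every coefficient is zero.
        return list(path[0][1])
    for (hi, beta_hi), (lo, beta_lo) in zip(path, path[1:]):
        if lo <= lam <= hi:
            t = (lam - lo) / (hi - lo)
            return [b_lo + t * (b_hi - b_lo) for b_lo, b_hi in zip(beta_lo, beta_hi)]
    raise ValueError("lam must be non-negative")


# Hypothetical data: three coordinates of z.
z = [3.0, -1.5, 0.5]
path = solve_at_breakpoints(z)

# Any value of the hyperparameter can now be queried without re-solving.
beta = solution_from_path(path, 1.0)
direct = [soft_threshold(v, 1.0) for v in z]
```

In this toy setting the path is stored as a handful of breakpoints, and the interpolated `beta` agrees with the directly computed `direct` for every queried value, whereas a grid search would only yield solutions at the grid points.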
Url: https://tel.archives-ouvertes.fr/tel-00422854
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Hal, to step Corpus: 000273
- to stream Hal, to step Curation: 000273
- to stream Hal, to step Checkpoint: 000505
- to stream Main, to step Merge: 000B01
- to stream Main, to step Curation: 000A93
- to stream Main, to step Exploration: 000A93
- to stream France, to step Extraction: 000879
Links to Exploration step
Hal:tel-00422854
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Regularization Path Algorithm for Statistical Learning</title>
<title xml:lang="fr">Algorithme de Chemin de Régularisation pour l'apprentissage Statistique</title>
<author><name sortKey="Karina, Zapien" sort="Karina, Zapien" uniqKey="Karina Z" first="Zapien" last="Karina">Zapien Karina</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID"><orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc><address><addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation><relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300317" type="direct"><org type="institution" xml:id="struct-300317" status="VALID"><orgName>Université du Havre</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct"><org type="institution" xml:id="struct-300318" status="VALID"><orgName>Université de Rouen</orgName>
<desc><address><addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct"><org type="department" xml:id="struct-301288" status="VALID"><orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect"><org type="institution" xml:id="struct-301232" status="VALID"><orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName><settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:tel-00422854</idno>
<idno type="halId">tel-00422854</idno>
<idno type="halUri">https://tel.archives-ouvertes.fr/tel-00422854</idno>
<idno type="url">https://tel.archives-ouvertes.fr/tel-00422854</idno>
<date when="2009-07-09">2009-07-09</date>
<idno type="wicri:Area/Hal/Corpus">000273</idno>
<idno type="wicri:Area/Hal/Curation">000273</idno>
<idno type="wicri:Area/Hal/Checkpoint">000505</idno>
<idno type="wicri:Area/Main/Merge">000B01</idno>
<idno type="wicri:Area/Main/Curation">000A93</idno>
<idno type="wicri:Area/Main/Exploration">000A93</idno>
<idno type="wicri:Area/France/Extraction">000879</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Regularization Path Algorithm for Statistical Learning</title>
<title xml:lang="fr">Algorithme de Chemin de Régularisation pour l'apprentissage Statistique</title>
<author><name sortKey="Karina, Zapien" sort="Karina, Zapien" uniqKey="Karina Z" first="Zapien" last="Karina">Zapien Karina</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID"><orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc><address><addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation><relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300317" type="direct"><org type="institution" xml:id="struct-300317" status="VALID"><orgName>Université du Havre</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct"><org type="institution" xml:id="struct-300318" status="VALID"><orgName>Université de Rouen</orgName>
<desc><address><addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct"><org type="department" xml:id="struct-301288" status="VALID"><orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect"><org type="institution" xml:id="struct-301232" status="VALID"><orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName><settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="en"><term>dimensionality reduction</term>
<term>model selection</term>
<term>neighborhood graph</term>
<term>ranking</term>
<term>regularization path</term>
<term>sparsity</term>
</keywords>
<keywords scheme="mix" xml:lang="fr"><term>chemin de régularisation</term>
<term>classification</term>
<term>graphe de similarité</term>
<term>ordonnancement</term>
<term>parcimonie</term>
<term>réduction de dimension</term>
<term>sélection de modèle</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The selection of a proper model is an essential task in statistical learning. In general, for a given learning task, a set of parameters has to be chosen, each of which corresponds to a different degree of "complexity". In this situation, the model selection procedure becomes a search for the optimal "complexity", allowing us to estimate a model that ensures good generalization. This model selection problem can be summarized as the calculation of one or more hyperparameters that define the model complexity, in contrast to the parameters that specify a model within the chosen complexity class.
The usual approach to determining these hyperparameters is a "grid search": given a set of candidate values, the generalization error of the best model is estimated for each of them. This thesis focuses on an alternative approach that consists of computing the complete set of solutions for all hyperparameter values. This set is called the regularization path. It can be shown that for the problems we are interested in, parametric quadratic programming (PQP) problems, the corresponding regularization path is piecewise linear. Moreover, its calculation is no more complex than calculating a single PQP solution.
This thesis is organized in three chapters. The first introduces the general setting of a learning problem under the Support Vector Machine (SVM) framework, together with the theory and algorithms that allow us to find a solution. The second deals with supervised learning problems for classification and ranking using the SVM framework. It is shown that the regularization path of these problems is piecewise linear, and alternative proofs to that of Rosset (2004) are given via the subdifferential. These results lead to the corresponding algorithms for solving these supervised problems. The third deals with semi-supervised learning problems, followed by unsupervised learning problems. For semi-supervised learning, a sparsity constraint is introduced along with the corresponding regularization path algorithm. For unsupervised learning, graph-based dimensionality reduction methods are used. Our main contribution is a novel algorithm that chooses the number of nearest neighbors adaptively, in contrast to classical approaches based on a fixed number of neighbors.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Haute-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement><li>Le Havre</li>
<li>Rouen</li>
</settlement>
<orgName><li>Université de Rouen</li>
<li>Université du Havre</li>
</orgName>
</list>
<tree><country name="France"><region name="Région Normandie"><name sortKey="Karina, Zapien" sort="Karina, Zapien" uniqKey="Karina Z" first="Zapien" last="Karina">Zapien Karina</name>
</region>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/France/explor/LeHavreV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000879 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000879 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Wicri/France |area= LeHavreV1 |flux= France |étape= Analysis |type= RBID |clé= Hal:tel-00422854 |texte= Regularization Path Algorithm for Statistical Learning }}
This area was generated with Dilib version V0.6.25.